18. Data Quality

Examples of Data Quality Requirements

  • Data must be of a certain size
  • Data must be accurate to some margin of error
  • Data must arrive within a given timeframe from the start of execution
  • Pipelines must run on a particular schedule
  • Data must not contain any sensitive information
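Two of the requirements above, accuracy within a margin of error and the absence of sensitive information, can be sketched as simple checks. This is a minimal illustration, not a prescribed implementation: the function names, the relative-margin definition, and the list of sensitive column names are all assumptions made for the example.

```python
# Hypothetical column names treated as sensitive; adjust to your own data.
SENSITIVE_COLUMNS = {"ssn", "email", "credit_card"}


def within_margin(actual, expected, margin):
    """Return True if `actual` differs from `expected` by at most
    `margin`, expressed as a fraction of `expected` (0.05 means 5%)."""
    return abs(actual - expected) <= margin * abs(expected)


def sensitive_columns_present(columns):
    """Return the sorted list of sensitive column names found in `columns`.

    Matching is case-insensitive and by exact name only, so this is a
    coarse first check rather than a full scan of the data itself.
    """
    return sorted(SENSITIVE_COLUMNS & {c.lower() for c in columns})
```

A pipeline step could call `within_margin(total, expected_total, 0.05)` and fail the run when it returns False, or refuse to publish a table when `sensitive_columns_present(...)` returns a non-empty list.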

Data Quality Requirements

Which of the following are true about data quality requirements?

SOLUTION:
  • Requirements are how we set and measure quality
  • Requirements let engineering and non-engineering roles agree on the high-level method for preparing the output
  • Requirements tell engineers what the output of their data pipelines should be

SLA Quiz

How would you set a requirement that data arrives within a certain timeframe after a DAG starts?

SOLUTION: Use a Service Level Agreement (SLA)
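
The idea behind an SLA can be sketched in plain Python: compare the time the data arrived against the execution start plus an allowed duration. The function name `sla_missed` and the timestamps are hypothetical, chosen only for illustration. In Airflow versions that support task SLAs, the allowed duration is typically declared by passing a `timedelta` as a task's `sla` argument; check the documentation for the version you run.

```python
from datetime import datetime, timedelta


def sla_missed(execution_start, data_arrival, sla):
    """Return True if the data arrived later than `sla` after the
    start of execution, i.e. the timeframe requirement was violated."""
    return data_arrival - execution_start > sla


# Illustrative values: the run starts at midnight with a one-hour SLA.
start = datetime(2024, 1, 1, 0, 0)
one_hour = timedelta(hours=1)

on_time = sla_missed(start, start + timedelta(minutes=45), one_hour)
late = sla_missed(start, start + timedelta(minutes=90), one_hour)
```

Here `on_time` is False because the data arrived inside the window, while `late` is True because it arrived 30 minutes past it.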

No data

What kind of requirement would be violated if no data was produced by a DAG?

SOLUTION: Data must be of a certain size
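
A size requirement like this is often enforced with a row-count check that fails the run when too few records are produced. The sketch below assumes the row count has already been obtained (for example, from a `SELECT COUNT(*)` query); the function name and the `min_rows` default are hypothetical.

```python
def check_min_rows(row_count, min_rows=1):
    """Raise ValueError if fewer than `min_rows` rows were produced.

    Returns the row count unchanged when the requirement is met, so the
    caller can log it or pass it along.
    """
    if row_count < min_rows:
        raise ValueError(
            f"Data quality check failed: expected at least {min_rows} "
            f"row(s), got {row_count}"
        )
    return row_count
```

Raising an exception, rather than returning a flag, makes the failure visible to whatever is running the task, so an empty output is not silently passed downstream.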

Data arrival

What kind of requirement would be violated if data arrived after it was needed?

SOLUTION: Data must arrive within a given timeframe from the start of execution